Search CORE

55 research outputs found

WheaCha: A Method for Explaining the Predictions of Models of Code

Author: Wang Ke
Wang Linzhang
Wang Yu
Publication date: 12/07/2022

Attribution methods have emerged as a popular approach to interpreting model predictions based on the relevance of input features. Although a feature importance ranking can provide insight into how models arrive at a prediction from a raw input, it does not give a clear-cut definition of the key features models use for the prediction. In this paper, we present a new method, called WheaCha, for explaining the predictions of code models. Although WheaCha employs the same mechanism of tracing model predictions back to the input features, it differs from all existing attribution methods in crucial ways. Specifically, for any prediction of a learned code model, WheaCha divides an input program into "wheat" (i.e., the defining features that are the reason for which models predict the label that they predict) and the rest, "chaff". We realize WheaCha in a tool, HuoYan, and use it to explain four prominent code models: code2vec, seq-GNN, GGNN, and CodeBERT. Results show that (1) HuoYan is efficient, taking on average under twenty seconds to compute the wheat for an input program in an end-to-end fashion (i.e., including model prediction time); (2) the wheat that all models use to predict input programs is made of simple syntactic or even lexical properties (i.e., identifier names); and (3) based on wheat, we present a novel approach to explaining the predictions of code models through the lens of training data.

arXiv.org e-Print Archive

Infrared: A Meta Bug Detector

Author: Wang Linzhang
Wang Yu
Zhang Chi
Publication date: 18/09/2022

Recent breakthroughs in deep learning methods have sparked a wave of interest in learning-based bug detectors. Compared to traditional static analysis tools, these bug detectors are learned directly from data and are thus easier to create. On the other hand, they are difficult to train, requiring a large amount of data which is not readily available. In this paper, we propose a new approach, called meta bug detection, which offers three crucial advantages over existing learning-based bug detectors: it is bug-type generic (i.e., capable of catching types of bugs that are totally unobserved during training), self-explainable (i.e., capable of explaining its own prediction without any external interpretability methods), and sample efficient (i.e., requiring substantially less training data than standard bug detectors). Our extensive evaluation shows that our meta bug detector (MBD) is effective in catching a variety of bugs, including null pointer dereference, array index out-of-bound, file handle leak, and even data races in concurrent programs; in the process, MBD also significantly outperforms several noteworthy baselines, including Facebook Infer, a prominent static analysis tool, and FICS, the latest anomaly detection method.

arXiv.org e-Print Archive

Finding Cross-rule Optimization Bugs in Datalog Engines

Author: Rigger Manuel
Wang Linzhang
Zhang Chi
Publication date: 20/02/2024

Datalog is a popular and widely-used declarative logic programming language. Datalog engines apply many cross-rule optimizations; bugs in them can cause incorrect results. To detect such optimization bugs, we propose an automated testing approach called Incremental Rule Evaluation (IRE), which synergistically tackles the test oracle and test case generation problem. The core idea behind the test oracle is to compare the results of an optimized program and a program without cross-rule optimization; any difference indicates a bug in the Datalog engine. Our core insight is that, for an optimized, incrementally-generated Datalog program, we can evaluate all rules individually by constructing a reference program to disable the optimizations that are performed among multiple rules. Incrementally generating test cases not only allows us to apply the test oracle for every new rule generated; we can also ensure that every newly added rule generates a non-empty result with a given probability and eschew recomputing already-known facts. We implemented IRE as a tool named Deopt, and evaluated Deopt on four mature Datalog engines, namely Soufflé, CozoDB, μZ, and DDlog, and discovered a total of 30 bugs. Of these, 13 were logic bugs, while the remaining were crash and error bugs. Deopt can detect all bugs found by queryFuzz, a state-of-the-art approach. Out of the bugs identified by Deopt, queryFuzz might be unable to detect 5. Our incremental test case generation approach is efficient; for example, for test cases containing 60 rules, our incremental approach can produce 1.17× (for DDlog) to 31.02× (for Soufflé) as many valid test cases with non-empty results as the naive random method. We believe that the simplicity and the generality of the approach will lead to its wide adoption in practice. Comment: The ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications (2024), Pasadena, California, United States

arXiv.org e-Print Archive

Automatic Detection, Validation and Repair of Race Conditions in Interrupt-Driven Embedded Software

Author: Gao Fengjuan
Li Xuandong
Wang Ke
Wang Linzhang
Wang Yu
Yu Tingting
Zhao Jianhua
Publication date: 28/05/2023

Interrupt-driven programs are widely deployed in safety-critical embedded systems to perform hardware- and resource-dependent data operation tasks. The frequent use of interrupts in these systems can cause race conditions due to interactions between application tasks and interrupt handlers (or between two interrupt handlers). Numerous program analysis and testing techniques have been proposed to detect races in multithreaded programs. Little work, however, has addressed race condition problems related to hardware interrupts. In this paper, we present SDRacer, an automated framework that can detect, validate and repair race conditions in interrupt-driven embedded software. It uses a combination of static analysis and symbolic execution to generate input data for exercising the potential races. It then employs virtual platforms to dynamically validate these races by forcing the interrupts to occur at the potential racing points. Finally, it provides repair candidates to eliminate the detected races. We evaluate SDRacer on nine real-world embedded programs written in C. The results show that SDRacer can precisely detect and successfully fix race conditions. Comment: This is a draft version of the published paper. Ke Wang provides suggestions for improving the paper and README of the GitHub rep

arXiv.org e-Print Archive

Model-Based Security Testing

Author: A. Takanen
Alexander K. Petrenko
Barton P. Miller
David Basin
F. Y. Gu Tian-yang Shi Yin-sheng & Yuan
Guido Wimmel
Holger Schlingloff
Ida Hogganvik
Ina Schieferdecker
Jan Jürjens
Juergen Grossmann
K.A. Reay
Linzhang Wang
M. S. Lund
Mark Blackburn
Martin Schneider
Martin Weiglhofer
Matthias Güdemann
Paul Baker
Paul Gerrard
Rauli Kaksonen
Sjouke Mauw
Tejeddine Mouelhi
W E Vesely
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2012

Security testing aims at validating software system requirements related to security properties like confidentiality, integrity, authentication, authorization, availability, and non-repudiation. Although security testing techniques have been available for many years, there have been few approaches that allow for specification of test cases at a higher level of abstraction, for enabling guidance on test identification and specification, as well as for automated test generation. Model-based security testing (MBST) is a relatively new field, dedicated especially to the systematic and efficient specification and documentation of security test objectives, security test cases and test suites, as well as to their automated or semi-automated generation. In particular, the combination of security modelling and test generation approaches is still a challenge in research and of high interest for industrial applications. MBST includes, e.g., security functional testing, model-based fuzzing, risk- and threat-oriented testing, and the usage of security test patterns. This paper provides a survey of MBST techniques and the related models, as well as samples of new methods and tools that are under development in the European ITEA2-project DIAMONDS. Comment: In Proceedings MBT 2012, arXiv:1202.582

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

The Geochemical Features and Genesis of Ferromanganese Deposits from Caiwei Guyot, Northwestern Pacific Ocean

Author: Linzhang Wang
Zhigang Zeng
Publication venue: 'MDPI AG'
Publication date: 09/09/2022

The ferromanganese deposit is a type of marine mineral resource rich in Mn, Fe, Co, Ni, and Cu. Its growth process is generally multi-stage, and the guyot environment and seawater geochemical characteristics have a great impact on the growth process. Here, we use a scanning electron microscope, X-ray diffraction (XRD), inductively coupled plasma optical emission spectrometry (ICP-OES), X-ray fluorescence (XRF), and inductively coupled plasma mass spectrometry (ICP-MS) to test and analyze the texture morphology, microstructure, mineralogical features, and geochemical features of ferromanganese crust deposits at different distribution locations on Caiwei Guyot. The ferromanganese deposits of Caiwei Guyot are ferromanganese nodules on the slope and board ferromanganese crusts on the mountaintop edge, both of which are of hydrogenetic origin. A hydrogenetic origin indicates that the metal source is oxic seawater. Global palaeo-ocean events control the geochemical composition and growth process of the ferromanganese crusts and the nodule. Ferromanganese crusts that formed from the late Cretaceous on the mountaintop edge have a rough surface with black botryoidal shapes, indicating an environment with strong hydrodynamic conditions, while the ferromanganese nodule that formed from the Miocene on the slope has an oolitic surface as a result of water depth. Moreover, nanoscale or micron-scale diagenesis may occur during the growth process, affecting microstructure and mineralogical and geochemical features.

Multidisciplinary Digital Publishing Institute

The Geochemical Features and Genesis of Ferromanganese Deposits from Caiwei Guyot, Northwestern Pacific Ocean

Author: Linzhang Wang
Zhigang Zeng
Publication venue: MDPI AG
Publication date: 01/09/2022

Directory of Open Access Journals